[RLlib] Use `config` (not `self.config`) in `Learner.compute_loss_for_module` to prepare these for multi-agent-capability. #45053
Conversation
Signed-off-by: sven1977 <[email protected]>
…ics_do_over_03_learner_on_new_metrics_logger (Signed-off-by: sven1977 <[email protected]>; conflicts: rllib/algorithms/ppo/ppo_learner.py, rllib/algorithms/ppo/tf/ppo_tf_learner.py, rllib/algorithms/ppo/torch/ppo_torch_learner.py)
Signed-off-by: sven1977 <[email protected]>
LGTM. Great PR! Getting multi-agent off-policy ready.
@@ -72,7 +72,7 @@ def compute_loss_for_module(
             trajectory_len=rollout_frag_or_episode_len,
             recurrent_seq_len=recurrent_seq_len,
         )
-        if self.config.enable_env_runner_and_connector_v2:
+        if config.enable_env_runner_and_connector_v2:
I wonder whether the new EnvRunners work with APPO/IMPALA. In my test case they do not in the multi-agent case, where it tries to `compressed_if_needed` a list of episodes.
IMPALA and APPO are still WIP on the new EnvRunners and are officially not supported there yet.
https://docs.ray.io/en/master/rllib/rllib-new-api-stack.html
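For readers outside this thread, here is a minimal, hypothetical sketch (not the actual RLlib `Learner` source; only `compute_loss_for_module`, its `config` argument, and `module_id` are taken from the PR) of why the passed-in `config` matters: in a multi-agent setup, the base class can hand each RLModule its own, possibly overridden, config, so the loss code must not fall back to the learner-wide `self.config`.

```python
class SketchLearner:
    """Hypothetical sketch of the per-module config dispatch pattern."""

    def __init__(self, base_config, config_per_module=None):
        # Learner-wide config (what `self.config` refers to in the diff above).
        self.config = base_config
        # Optional per-RLModule configs, e.g. different hyperparameters for
        # different agents in a multi-agent setup.
        self._config_per_module = dict(config_per_module or {})

    def get_config_for_module(self, module_id):
        # Fall back to the learner-wide config if no per-module override exists.
        return self._config_per_module.get(module_id, self.config)

    def compute_losses(self, *, fwd_out, batch):
        losses = {}
        for module_id, module_batch in batch.items():
            losses[module_id] = self.compute_loss_for_module(
                module_id=module_id,
                # The crucial bit: hand the module-specific config down ...
                config=self.get_config_for_module(module_id),
                batch=module_batch,
                fwd_out=fwd_out[module_id],
            )
        return losses

    def compute_loss_for_module(self, *, module_id, config, batch, fwd_out):
        # ... so that subclasses read `config` (not `self.config`) here and
        # per-module settings actually take effect.
        raise NotImplementedError
```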
@@ -61,7 +61,7 @@ def compute_loss_for_module(
         ).squeeze()

         # Use double Q learning.
-        if self.config.double_q:
+        if config.double_q:
Great catch. I had the same change in another PR on the side. This would have led to some bugs in multi-agent off-policy.
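To illustrate the kind of bug being avoided, a tiny self-contained example (hypothetical configs and module IDs; not RLlib code): when two modules carry different `double_q` settings, only the passed-in `config` reflects the per-module choice.

```python
from types import SimpleNamespace

# Hypothetical per-module configs: one agent uses double Q learning, one does not.
config_per_module = {
    "agent_0": SimpleNamespace(double_q=True),
    "agent_1": SimpleNamespace(double_q=False),
}

def q_target_kind(config) -> str:
    # Mirrors the branch in the diff above: read the *passed-in* config.
    return "double-Q targets" if config.double_q else "vanilla Q targets"

for module_id, cfg in config_per_module.items():
    # Had the loss read `self.config.double_q`, both modules would silently use
    # the learner-wide setting and agent_1's override would be ignored.
    print(module_id, "->", q_target_kind(cfg))
```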
                ).items()
            }
        )
        for module_id in set(loss_per_module.keys()) - {ALL_MODULES}:
Awesome! MA-ready.
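A rough sketch of the per-module gradient loop shown above (hypothetical helper and arguments; only the `set(loss_per_module.keys()) - {ALL_MODULES}` iteration mirrors the PR, and the `ALL_MODULES` value here is illustrative):

```python
ALL_MODULES = "__all_modules__"  # sentinel key for the summed total loss (value illustrative)

def compute_gradients_sketch(loss_per_module, params_per_module, backward_fn):
    """Hypothetical helper: run one backward pass per RLModule instead of
    assuming a single hard-coded DEFAULT_POLICY key."""
    grads = {}
    # Skip the ALL_MODULES entry, which holds the summed total loss.
    for module_id in set(loss_per_module.keys()) - {ALL_MODULES}:
        grads[module_id] = backward_fn(
            loss_per_module[module_id], params_per_module[module_id]
        )
    return grads
```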
…self_config_in_learner_compute_loss (Signed-off-by: sven1977 <[email protected]>; conflicts: rllib/algorithms/sac/torch/sac_torch_learner.py)
Why are these changes needed?

- Use `config` (not `self.config`) in `Learner.compute_loss_for_module` to prepare these for multi-agent-capability.
- Change `compute_gradients` to NOT use DEFAULT_POLICY anymore, but to actually loop through the different RLModules.
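As a hedged usage sketch of what this prepares for (API names such as `algorithm_config_overrides_per_module` and `AlgorithmConfig.overrides()` should be verified against your Ray version; env name and module IDs are placeholders):

```python
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("my_multi_agent_env")  # placeholder env name
    .multi_agent(
        policies={"agent_0", "agent_1"},
        policy_mapping_fn=lambda agent_id, episode, **kwargs: agent_id,
        # Per-module overrides are exactly what `compute_loss_for_module`
        # must see via its `config` argument rather than `self.config`.
        algorithm_config_overrides_per_module={
            "agent_0": PPOConfig.overrides(entropy_coeff=0.0),
        },
    )
)
```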
Related issue number
Checks
- I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.